Abstract. We revisit the problem of model-based object recognition 
for intensity images and attempt to address some of the shortcomings of 
existing Bayesian methods, such as unsuitable priors and the treatment 
of residuals with a non-robust error norm. We do so by using a refor- 
mulation of the Huber metric and carefully chosen prior distributions. 
Our proposed method is invariant to 2-dimensional affine transformations
and, because it is relatively easy to train and use, it is suited for
general object matching problems. 

1 Introduction 

In this paper we examine the view-oriented case for model-based object
recognition, in which 2-dimensional representations of 3-dimensional objects,
called aspects or characteristic views, are used. Such methods have recently
become quite popular because of their applicability in many areas and their
ease of implementation, since they avoid storing and reconstructing a full 3D
model. 
In addition, there is evidence to suggest that view-oriented representations are 
used by the human visual system for object recognition [5]. The view-oriented 
object recognition problem for a single view can be formulated as follows: 

Definition 1. Suppose that we have a prototype template function $F_0$, an
image function $I$ and a transformation $T$ that transforms the template as
$F = TF_0$. The goal of object recognition is to minimise the expression:

$$\min_{\xi \in R} \; g(I, TF_0) \qquad (1)$$

with respect to the transformation parameters $\xi$, where $g(\cdot,\cdot)$ is
an error metric and $R$ the parameter space. If the minimum is less than or
equal to some threshold $t$, then we have a match. 

The main problem that arises from this formulation is the determination of
the parameters $\xi$ that minimise the above expression. Solving for $\xi$
depends on the transformation $T$. For complicated transformations $T$, the
optimisation is a nonlinear process and the minimum is found using an
iterative algorithm. 
1.1 Our Approach and Related Work 

We have based our approach on popular previous work by [5] and [6]. First,
Grenander et al. proposed a general deformable template model for Bayesian
inference on contour shape, in which deformations of the template are
represented as probabilistic transformations. Jain et al. used this approach
together with a snake-like potential function to attract the template toward
edge positions in the 
image. A similar scheme has been used by Cootes et al. [10], where the template 
is represented by the mean shape of a training set and a linear combination of 
the most important eigenmodes of the variation from the mean. The Bayesian 
object localisation method introduced by Sullivan et al. [8] is another interesting 
approach. Distributions of the template over the foreground and background are 
learned from training images, and used as the likelihood in a Bayesian inference 
scheme. 

In our approach we use intensity information directly, without the need to
extract features from the image. We also use novel distributions for the prior
and do not assume that all transformations are equally probable, which rules
out trivial solutions for the transformation parameters. Finally, our
likelihood function is based on a more robust error metric that tends to one
distribution when the template overlaps with an object (foreground) and to
another when the template is on the background. A Bayesian formulation that
combines this prior knowledge with information from the input image (the
likelihood) is used to find a match between the image and the template. This
combination is realised in the posterior probability, a maximum of which may
indicate a possible match. 

2 Deformation Model 

The deformation model we propose consists of a prototype model template of 
the representative shape of an object, a selection of parametric transformations 
that act on the template, and a set of constraints that bias the choices of possible 
deformation parameters. 

2.1 Prototype Template Representation 

The prototype template consists of the pixels (grey levels) within a boundary,
rectangular for convenience, chosen as a representative example of an object
or object class. The prototype is based on our prior knowledge about the
objects of interest, and is usually obtained from training samples. Such
training could be based on Principal Components Analysis (PCA) or shape
alignment, or the prototype could simply be the mean shape of the class.
Unlike other methods, our model is not parameterised; the transformation is.
The model contains grey-level and boundary information in the form of a
bitmap, and is thus appropriate for general object recognition tasks: to apply
the same method to a different class of objects we only need to generate a new
prototype image of that class. 



2.2 Parametric Transformations 



Although the prototype template represents the most likely a priori instance
of the object, we still need to deform it to match the image. The parametric
transformations consist of a global affine transformation $A$ and a local
deformation $D$. It is necessary to compose $A$ as a product of individual
meaningful transformations (primitive matrices). Such a composition is not
unique, but by adopting a canonical order for the transformations we could
write, for example, $A = S R U_x + d$, where $S$ is an anisotropic scale
matrix, $R$ is a rotation matrix, $U_x$ an angular shear matrix on the x-axis,
and $d = (d_x, d_y)$ a translation vector. 
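
As an illustration of this canonical composition, the following Python sketch
builds the linear part S R U_x from primitive matrices and applies the affine
map to a point. The function names, the counter-clockwise rotation convention
and the shear matrix [[1, tan(phi)], [0, 1]] are assumptions made for this
example; the text does not fix them.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def affine_linear_part(sx, sy, theta, phi):
    """Linear part S * R * U_x of the affine map (angles in radians).

    Assumed conventions (not fixed by the text): counter-clockwise rotation
    and an x-axis shear of the form [[1, tan(phi)], [0, 1]].
    """
    S = [[sx, 0.0], [0.0, sy]]
    R = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    U = [[1.0, math.tan(phi)], [0.0, 1.0]]
    return matmul2(S, matmul2(R, U))

def apply_affine(M, d, point):
    """Apply A(p) = M p + d to a point p = (x, y)."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + d[0],
            M[1][0] * x + M[1][1] * y + d[1])

# With unit scale and zero angles the map reduces to a pure translation.
M = affine_linear_part(sx=1.0, sy=1.0, theta=0.0, phi=0.0)
print(apply_affine(M, (2.0, 3.0), (1.0, 1.0)))
```

Under these conventions the rotation and shear factors both have unit
determinant, so the determinant of the linear part is the product of the two
scale parameters, and tan(phi) grows without bound as the shear angle
approaches a right angle.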

The local deformation $D$ is a 2D continuous mapping
$(x, y) \mapsto (x, y) + [D_x(x, y), D_y(x, y)]$, defined as a simple
sinusoidal function:

$$D_\psi(x, y) = [D_x(x, y), D_y(x, y)] = [\alpha \cos(2\pi k_x \Delta), \; \beta \cos(2\pi k_y \Delta)] , \qquad (2)$$

where $\psi = (\alpha, \beta, k_x, k_y, x_0, y_0)$ are the deformation
parameters, with $\alpha$, $\beta$ being the wave amplitudes, $k_x$, $k_y$ the
wavenumbers, and $\Delta = \sqrt{(x - x_0)^2 + (y - y_0)^2}$ the Euclidean
distance from the centre point $(x_0, y_0)$. We thus suppose that we have a
prototype template function $F_0(x, y)$ and a transformation $T$ that
transforms the template as follows:

$$TF_0(x, y) = F_0\big(S R U_x (x, y) + D_\psi(x, y) + (d_x, d_y)\big) . \qquad (3)$$

This is the parametric transformation that will deform the template to match
the image. It is realised by shearing the template by an angle $\varphi$, then
rotating by an angle $\theta$, scaling the result by $s_x$, $s_y$ along
directions x and y respectively, locally deforming the resulting template by
$\psi$ and finally translating by $d$. 
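
A minimal sketch of the coordinate map inside equation (3) might look as
follows. It assumes that the distance in equation (2) is the Euclidean
distance from the wave centre and that the linear part S R U_x has already
been composed into a 2x2 matrix `M`; the function names are hypothetical.

```python
import math

def local_deformation(x, y, alpha, beta, kx, ky, x0, y0):
    """Sinusoidal displacement D_psi(x, y) of equation (2)."""
    delta = math.hypot(x - x0, y - y0)  # distance from the wave centre
    return (alpha * math.cos(2.0 * math.pi * kx * delta),
            beta * math.cos(2.0 * math.pi * ky * delta))

def transform_point(x, y, M, psi, d):
    """Coordinate at which F_0 is sampled in equation (3):
    M (x, y) + D_psi(x, y) + d, with M the 2x2 matrix S R U_x
    and psi = (alpha, beta, kx, ky, x0, y0)."""
    ax = M[0][0] * x + M[0][1] * y
    ay = M[1][0] * x + M[1][1] * y
    wx, wy = local_deformation(x, y, *psi)
    return (ax + wx + d[0], ay + wy + d[1])

identity = [[1.0, 0.0], [0.0, 1.0]]
no_wave = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)  # zero amplitudes: D_psi vanishes
print(transform_point(5.0, 7.0, identity, no_wave, (1.0, -2.0)))
```

Because the displacement is a cosine, its magnitude never exceeds the
amplitudes alpha and beta, and it equals (alpha, beta) exactly at the wave
centre.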

2.3 Probabilistic Constraints 

Since not all choices of transformation parameters will produce a template that 
resembles the object(s) in the image, it is necessary to restrict their variability. 
We do so by imposing a probability density function (p.d.f) on the transforma- 
tion T. 

Consider the local deformation $D_\psi(x, y)$ first. We have chosen uniform
distributions for the wave centre parameters $x_0$, $y_0$, since any centre
point has an equal probability of producing a valid sinusoid. We further
assume that the two sinusoids in (2) have amplitudes $\alpha$ and $\beta$ that
are independently and identically normally distributed with zero mean and
variance $\sigma_{\alpha\beta}^2$. For the wavenumbers $k_x$ and $k_y$, we
also assume zero-mean, independent and identical normal distributions with
variance $\omega^2$. This results in a prior distribution for the shape
parameters $\psi$:

$$\Pr(\psi) \propto \exp\left\{ -\frac{\alpha^2 + \beta^2}{2\sigma_{\alpha\beta}^2} - \frac{k_x^2 + k_y^2}{2\omega^2} \right\} , \qquad (4)$$

that favours small deformations of the object in preference to large ones. 

For the rotation and translation, we can assume that all rotations and
translations are equally probable, and thus consider their parameters
$\theta$ and $d$ as being uniformly distributed. However, the scale and shear
transformations require a different approach, and special care is required in
choosing their p.d.f.s. The reason for this comes from the behaviour of the
error function (1) for certain values or ranges of values of the parameters
$s = (s_x, s_y)$ and $\varphi$. More specifically, if one or both of the scale
parameters are very small, $F(x, y)$ will collapse into a line or a single
point respectively. This is of course not a valid representation of the
template, but the error function will undoubtedly have a minimum for these
values of the scale parameters. Such trivial solutions should not be allowed.
Similar behaviour occurs with the shear angle $\varphi$, which for
$\varphi = \pm\pi/2$ will collapse the object into a line. 

To avoid these problems, we need to forbid such values for the scale and shear
parameters. To do so, we define a prior for these parameters that will bias
them away from such values. A good choice for the scale parameters $s_x$ and
$s_y$ is the inverse Gaussian (Wald) distribution which, if we assume that
$s_x$ and $s_y$ are independent, that their mean scale $\bar{s}$ is 1, and
that their scale parameter is $\sigma_s$, leads to:

$$\Pr(s) = \frac{\sigma_s \exp\left\{ -\frac{\sigma_s}{2} \left[ \frac{(s_x - 1)^2}{s_x} + \frac{(s_y - 1)^2}{s_y} \right] \right\}}{2\pi \sqrt{s_x^3 s_y^3}} . \qquad (5)$$

The Wald distribution is ideal because it assigns very low probability to
values close to zero, while it allows us to determine the probability of large
values of the scale parameter $s$ by adjusting the tail of the p.d.f. For the
shear angle, we would like to introduce a bias in favour of small
deformations, and to rule out the values $\varphi = \pm\pi/2$. Furthermore,
when the mean shear angle is zero, the distribution must be symmetric. On the
other hand, if the mean angle is close to $-\pi/2$ then the distribution for
negative values must fall sharply, whilst the distribution for high values
must exhibit similar behaviour when the mean angle is close to (but not quite)
$\pi/2$. We have therefore chosen a mixture model of two Gumbel distributions
[11], with: 

Pr{(p) = , (6) 

where b is the shape parameter and A = '^^ ^ . Since the individual
transformation parameters were assumed independent, the total prior p.d.f.
$\Pr(\xi)$ is the product of the individual p.d.f.s (4), (5) and (6). 
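
To make the effect of these priors concrete, the sketch below evaluates the
negative logarithm of the Wald prior on the scale parameters and of the
zero-mean normal prior on the amplitudes and wavenumbers. It assumes the
standard inverse Gaussian density with mean 1 and shape parameter sigma_s,
drops additive constants from the normal prior, and omits the Gumbel mixture
for the shear angle; all function and argument names are illustrative only.

```python
import math

def neg_log_wald(s, sigma_s):
    """-log of the Wald density with mean 1 and shape sigma_s, for s > 0."""
    return (0.5 * math.log(2.0 * math.pi * s ** 3 / sigma_s)
            + sigma_s * (s - 1.0) ** 2 / (2.0 * s))

def neg_log_scale_prior(sx, sy, sigma_s):
    """-log Pr(s) for independent Wald priors on sx and sy."""
    return neg_log_wald(sx, sigma_s) + neg_log_wald(sy, sigma_s)

def neg_log_deformation_prior(alpha, beta, kx, ky, sigma_ab, omega):
    """-log Pr(psi) for the normal priors, up to an additive constant."""
    return ((alpha ** 2 + beta ** 2) / (2.0 * sigma_ab ** 2)
            + (kx ** 2 + ky ** 2) / (2.0 * omega ** 2))

# The penalty grows steeply as the scale approaches zero.
for s in (0.01, 0.5, 1.0, 3.0):
    print(s, round(neg_log_scale_prior(s, s, sigma_s=4.0), 3))
```

Working with the negative logarithm directly, rather than taking the log of
the density, avoids numerical underflow for very small scales, which is
exactly the region the prior is meant to penalise.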



3 Objective Function 



Two commonly used metrics in template matching applications are the $L_2$
metric and the $L_1$ metric, which are valid from a maximum likelihood
perspective if the error residuals are normally distributed or exponentially
distributed respectively. However, [7] have shown that additive noise in real
images is generally not normally distributed, and that the majority of the
variation comes from illumination changes and in-class object variation. In
addition, [8] have shown that when using an error metric (such as the $L_2$)
and considering only the portion of the image under the template, the
observations $I$ are a function of the hypothesis $\xi$. That is not valid in
a Bayesian framework, since $I$ should be considered as fixed. In [8] a
learning process is therefore used to model the different foreground and
background distributions. Here, we use a simple parametric distribution to
interpolate between the foreground and background behaviour. Since in general
we know little about the latter, it should be based on a robust statistic. The
$L_1$ metric, although robust, is singular when the residual goes to zero,
which makes the optimisation process difficult. For this reason, we have
chosen as a metric a reformulation of the Huber norm [9]. This smooth Huber
norm is continuous and defined as: 



$$g_\tau(x) = \sqrt{1 + \frac{x^2}{\tau^2}} - 1 , \qquad (7)$$

where $\tau$ is the threshold between the $L_1$ and $L_2$ norms. The smooth
Huber norm treats residuals close to zero (template over the foreground) with
the $L_2$ norm and large residuals (template over the background) with the
$L_1$ norm. By using equations (1), (3) and (7) we obtain the combined
objective function $S$, which needs to be minimised:

$$\min S(I, \xi) = \sum_{x,y} \left[ \sqrt{1 + \frac{\big(I(u + x, v + y) - TF_0(x, y)\big)^2}{\tau^2}} - 1 \right] . \qquad (8)$$
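
The two regimes of the smooth Huber norm can be checked numerically. The
sketch below assumes the form sqrt(1 + x^2/tau^2) - 1, which behaves like
x^2/(2 tau^2) for residuals much smaller than tau and like |x|/tau - 1 for
residuals much larger than tau, and sums it over pixel residuals as in the
objective function. Names are hypothetical, and images are plain nested lists
of grey levels that are assumed to be already aligned.

```python
import math

def smooth_huber(x, tau):
    """Smooth Huber norm: quadratic near zero, linear for large |x|."""
    return math.sqrt(1.0 + (x / tau) ** 2) - 1.0

def objective(image_patch, warped_template, tau):
    """Sum of smooth Huber norms of the pixel residuals."""
    return sum(smooth_huber(i - f, tau)
               for row_i, row_f in zip(image_patch, warped_template)
               for i, f in zip(row_i, row_f))

tau = 1.0
small, large = 0.01, 1000.0
# Small residual: compare with the L2-like approximation x^2 / (2 tau^2).
print(smooth_huber(small, tau), small ** 2 / (2.0 * tau ** 2))
# Large residual: compare with the L1-like approximation |x| / tau - 1.
print(smooth_huber(large, tau), abs(large) / tau - 1.0)
```

A single outlying pixel therefore contributes to the objective roughly in
proportion to its residual rather than to its square, which is what makes the
measure robust over the background.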

If we reformulate (8) as a p.d.f. we see that the likelihood of observing the
input image given the deformations on the prototype template is:

$$\Pr(I \mid \xi) = C_1 \exp\{-S(I, \xi)\} , \qquad (9)$$

where $C_1$ is a normalising constant, equal to $1/(2 e K_1(1) \tau)$, where
$e$ is the base of the natural exponential and $K_1$ is a modified Bessel
function of the second kind. 
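
The value of the normalising constant can be verified for a single residual
x, assuming the smooth Huber norm has the form sqrt(1 + x^2/tau^2) - 1. The
derivation below substitutes x = tau sinh t and uses the standard integral
representation K_nu(z) = int_0^infty exp(-z cosh t) cosh(nu t) dt of the
modified Bessel function of the second kind:

```latex
\begin{align*}
\int_{-\infty}^{\infty} e^{-g_\tau(x)}\,dx
  &= e \int_{-\infty}^{\infty} \exp\!\left(-\sqrt{1 + x^2/\tau^2}\right) dx \\
  &= e\,\tau \int_{-\infty}^{\infty} e^{-\cosh t} \cosh t \, dt
     \qquad (x = \tau \sinh t,\; dx = \tau \cosh t\, dt) \\
  &= 2\,e\,\tau \int_{0}^{\infty} e^{-\cosh t} \cosh t \, dt
   = 2\,e\,\tau\,K_1(1) .
\end{align*}
```

The density exp(-g) therefore integrates to one when multiplied by
1/(2 e tau K_1(1)), which is the stated constant for each residual.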

Finally, we may use the fact that
$\Pr(\xi \mid I) \propto \Pr(I \mid \xi)\Pr(\xi)$ and combine equations (4),
(5), (6) and (9) to obtain the posterior p.d.f. of the parameters given an
image $I$. The parameters may therefore be obtained by minimising the
corresponding negative log-posterior which, for example, if the mean shear
angle in (6) is zero, is given by: 

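$$\min\{-\log \Pr(\xi \mid I)\} = \log\sqrt{s_x^3 s_y^3} - \log \Pr(\varphi) + \frac{\sigma_s}{2}\left[ \frac{(s_x - 1)^2}{s_x} + \frac{(s_y - 1)^2}{s_y} \right] + \frac{\alpha^2 + \beta^2}{2\sigma_{\alpha\beta}^2} + \frac{k_x^2 + k_y^2}{2\omega^2} + S(I, \xi) , \qquad (10)$$

where $\xi = (s_x, s_y, \varphi, k_x, k_y, \alpha, \beta, x_0, y_0, d_x, d_y, \theta)$
are the transformation parameters, $\Pr(\varphi)$ is the shear prior (6),
$S(I, \xi)$ is the objective function (8), and additive constants have been
dropped. Note that the distribution shape parameters $b$, $\omega$,
$\sigma_s$, $\sigma_{\alpha\beta}$ and the threshold $\tau$ are treated as
constant. 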



4 Experimental Results 



We have experimented with greyscale images of faces, such as those shown in
Figs. 1 and 2. First, we present the effects of an appropriately chosen prior
on the error function. In this example, we have isolated the scale space by
choosing a rectangular template (the female face on the bottom right of the
picture) and varying the scale parameters $s_x$, $s_y$ while keeping all other
parameters constant. The resulting sum of squared differences error
(normalised to a value of 1) can be seen on the top right of Fig. 1. Note that
the desired solution is at $s_x = s_y = 1$ and trivial solutions are located
at values of either of the parameters $s$ close to zero. If we now choose a
Wald prior with a peak at (1, 1) (bottom left), and calculate the negative
log-probability, we get the surface on the bottom right. The trivial solutions
have now become maxima, and the global minimum is at the desired solution
(1, 1). Compared to the original function, the negative log-posterior surface is 
convex with a very large basin of attraction. 

Fig. 1. The test image and template (top left) and the error function for the
scale parameters (top right). The desired solution is at $s_x = s_y = 1$. The
chosen Wald prior is illustrated (bottom left) and the resulting negative
log-posterior probability (bottom right). The desired solution remains at the
same position but without the trivial solutions.

We also show some optimisation results, where a template is taken from the
image (Fig. 2) and is randomly affine transformed and locally deformed. We
then use numerical optimisation to match the deformed template to the original
image and see if we can find the correct parameters of the transformation. The
template is placed on the image (Fig. 2, left) and an exhaustive search is
used on the translation parameters $d_x$, $d_y$ in order to find a good
starting location for the optimisation algorithm. Using, for simplicity, a
variation of the Simplex algorithm [12], we minimise the negative
log-posterior with respect to the parameters $\xi$ and obtain the resulting
template, which is superimposed on the right image of Fig. 2. Visually, the
results are quite pleasing, with the affine parameters being correctly
identified within an appropriate error deviation (see Table 1). 

Fig. 2. A randomly transformed template is placed on the image (left), and
by means of numerical optimisation we find the parameters for which the
negative log-posterior has a minimum value. The results can be seen on the
right.

Table 1. Comparison between actual and estimated values of the transformation
parameters from Fig. 2

Transformation                 Actual            Estimated         Absolute deviation
Rotation ($\theta$)            30.47°            29.7046°          0.7654°
Translation ($d_x$, $d_y$)     211, 37           213, 38           2, 1
Scale ($s_x$, $s_y$)           1.3077, 1.1923    1.3125, 1.2719    0.0048, 0.0796
Shear ($\varphi$)              27°               24.6776°          2.3224°
Sinusoid ($\alpha$, $k$)       1.96, 0.0327      0.0032, 0.0069    1.9568, 0.0258

5 Conclusions and Future Work 

We have presented a robust treatment of the view-oriented object recognition
problem for intensity images under a Bayesian formulation. We have introduced
prior distributions to bias appropriately a template that deforms under an
affine transformation and a sinusoidal geometric deformation. We have also
addressed the problem of different distributions of the foreground and
background by using the robust smooth Huber metric. Some preliminary results
obtained with our method were presented. 

There are many issues that we would like to examine in future work. In
particular, we have only discussed grey-level imagery; extension to colour
imagery is needed. In addition, we would like to experiment with other
metrics, more closely related to what is known about the statistics of images
of natural and man-made scenes [13]. We would also like to experiment with
explicit modelling of the foreground and background distributions from
training samples, using a statistical mixture model. Finally, at this early
stage of our work, we have not discriminated between intrinsic variations of
the template, that is, variations of the shape of the object that depend only
on the properties of the object, and extrinsic variations, which may depend on
the viewpoint [14]. We hope to introduce models for the extrinsic, viewpoint
variations in the future. 